Speeding up parallel GROMACS on high-latency networks

Authors

  • Carsten Kutzner
  • David van der Spoel
  • Martin Fechner
  • Erik Lindahl
  • Udo W. Schmitt
  • Bert L. de Groot
  • Helmut Grubmüller
Abstract

We investigate the parallel scaling of the GROMACS molecular dynamics code on Ethernet Beowulf clusters and the prerequisites for decent scaling even on such clusters, which offer only limited bandwidth and high latency. GROMACS 3.3 scales well on supercomputers like the IBM p690 (Regatta) and on Linux clusters with a special interconnect like Myrinet or Infiniband. On the widely used Ethernet switched clusters, however, the high single-node performance of GROMACS means that scaling typically breaks down when more than two computer nodes are involved, limiting the absolute speedup that can be gained to about 3 relative to a single-CPU run. With the LAM MPI implementation, the main scaling bottleneck is identified here as the all-to-all communication that is required every time step. During such an all-to-all communication step, a huge number of messages floods the network, and as a result many TCP packets are lost. We show that Ethernet flow control prevents network congestion and leads to substantial scaling improvements. For 16 CPUs, for example, a speedup of 11 has been achieved. For more nodes, however, this mechanism also fails. Using an optimized all-to-all routine that sends the data in an ordered fashion, we show that it is possible to completely prevent packet loss for any number of multi-CPU nodes. Thus, the GROMACS scaling improves dramatically, even for switches that lack flow control. In addition, for the common HP ProCurve 2848 switch we find that the way the nodes are connected to the switch's ports is essential for optimum all-to-all performance. This is also demonstrated for the example of the Car-Parrinello MD code.
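The abstract does not spell out how the optimized routine orders its messages. As a rough illustration of what "ordered" can mean, the sketch below builds a generic shifted schedule in which, during each phase, every process sends to exactly one partner and receives from exactly one partner, so no receiver is flooded. This is an assumption for illustration only, not the authors' routine; the function names are hypothetical, and multi-CPU nodes and the switch topology are not modeled.

```python
def ordered_all_to_all_schedule(n_procs):
    """Return a list of phases; each phase is a list of (sender, receiver) pairs.

    Hypothetical schedule: in phase `step`, rank r sends to (r + step) % n_procs.
    """
    phases = []
    for step in range(1, n_procs):
        phase = [(rank, (rank + step) % n_procs) for rank in range(n_procs)]
        phases.append(phase)
    return phases


def is_contention_free(phase):
    """True if no rank sends twice and no rank receives twice within the phase."""
    senders = [s for s, _ in phase]
    receivers = [r for _, r in phase]
    return len(set(senders)) == len(senders) and len(set(receivers)) == len(receivers)


# Example with 4 processes: 3 phases cover all 12 ordered sender/receiver pairs.
schedule = ordered_all_to_all_schedule(4)
for step, phase in enumerate(schedule, start=1):
    print(step, phase)
```

An unordered all-to-all lets every process post all of its messages at once, which is the situation the abstract associates with lost TCP packets. A phased schedule like this one trades some concurrency for a bounded load on each receiving port.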


Similar resources

Speeding up the Stress Analysis of Hollow Circular FGM Cylinders by Parallel Finite Element Method

In this article, a parallel computer program based on the Finite Element Method is implemented to speed up the analysis of hollow circular cylinders made from Functionally Graded Materials (FGMs). FGMs are inhomogeneous materials whose composition gradually varies over volume. In parallel processing, an algorithm is first divided into independent tasks, which may use individual or shared da...


Gromita: A Fully Integrated Graphical User Interface to Gromacs 4

Gromita is a fully integrated and efficient graphical user interface (GUI) to the recently updated molecular dynamics suite Gromacs, version 4. Gromita is a cross-platform, perl/tcl-tk based, interactive front end designed to break the command line barrier and introduce a new user-friendly environment to run molecular dynamics simulations through Gromacs. Our GUI features a novel workflow inter...


GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit

MOTIVATION Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to mo...


2: Characteristics of Gigabit Networks

Advances in communication rates exceed the ability of a single source to fully utilize a gigabit WAN channel with existing deterministic protocols. Parallel communication describes a method for reducing latency by managing indeterminism and increasing channel utilization, given a surplus bandwidth-delay product. It involves a nondeterministic state mechanism, with a modified protocol interface....


Communication Latency Hiding: Model and Implementation in High-latency Computer Networks

The potential of large numbers of workstations for solving very large problems is tremendous. Nevertheless, it is often considered inappropriate to parallelize applications with a fair amount of communication on computer networks, because communication via networks with high latency and low bandwidth presents a technological bottleneck. In this paper, a model to analyze the gain of communicatio...



Journal:
  • Journal of Computational Chemistry

Volume 28, Issue 12

Pages  -

Publication date: 2007